AP® Environmental Science Background 


• Course and exam launched in 1998 

• 5 AP score categories (requires 4 cut scores) 

• Cut scores initially established with a college 
comparability study 

• Recently, the AP program has been moving toward 
panel-based standard setting in place of college 
comparability studies 

• In June 2011, the first AP standard setting was 
conducted 
Importance of gathering validity evidence 
for standard setting procedures 


• Types of validity evidence 

• Procedural validity 

• Internal validity 

• External validity 

• How have judgments from standard settings been 
evaluated? 

• Utilize g-theory (e.g., Brennan, 1995) 

• Many-facet Rasch model (Engelhard, 2011) 


(Cizek & Bunch, 2007; Hambleton & Pitoniak, 2006; Kane, 2001) 
Many-Facet Rasch Model 


• Focus is on standard setting judgments rather 
than scores on a rater-mediated assessment 

[Equation: MFR model for the adjacent-category log-odds ln(P_nijk / P_nijk-1); right-hand side not legible in source]

• Variability in ratings is a function of specified 
facets (e.g., Panelist severity, Judged item 
difficulty, Judged average performance level) 

• MFR Model provides: 

• Rating quality indices 

• Model-data fit 

• Display of the facets on a variable map 
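
As a rough illustration of how facet locations translate into ratings, the sketch below converts a combined facet location into probabilities for the five rating categories under an adjacent-category (rating scale) Rasch formulation, ln(P_k / P_k-1) = eta - tau_k. The function name, the `eta` argument, and the sign convention in the usage lines are assumptions made for illustration, and the gender and level facets are left out for brevity. The thresholds are the Round 1 values reported in the performance standards results.

```python
import math

def category_probabilities(eta, thresholds):
    """Probabilities of rating categories 0..len(thresholds).

    Adjacent-category form: ln(P_k / P_{k-1}) = eta - thresholds[k-1].
    """
    # The lowest category has log-numerator 0; each step adds (eta - tau_k).
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + eta - tau)
    peak = max(log_num)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# Round 1 thresholds for 2/3, 3/4, 4/5, Above 5 (from the results tables).
ROUND1_THRESHOLDS = [-2.37, -0.24, 0.86, 1.75]

# ASSUMED sign convention: panelist and item add, round subtracts.
eta = 0.17 + 1.19 - 0.16  # panelist 3, item 9, round 1
labels = ["1/2", "2/3", "3/4", "4/5", "Above 5"]
for label, p in zip(labels, category_probabilities(eta, ROUND1_THRESHOLDS)):
    print(f"{label}: {p:.3f}")
```

Working with cumulative log-numerators keeps the computation to a single pass and avoids overflow for extreme locations.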
Purpose of study 


Use the MFR Model to evaluate the quality of the 
judgments from the AP Environmental Science 
standard setting, specifically: 

1. What are the locations of the panelists, items, 
rounds, and performance standards on the 
construct being measured (i.e., AP 
Environmental Science)? 


2. Do panelist characteristics of gender or level 
of course taught (high school or college) 
influence their conceptualization of the 
underlying construct? 
Methods 


• 15 panelists were Environmental Science SMEs 

• APES exam: 100 MCQs, 4 FRQs 

• We only focused on MCQ item analyses 

• Multiple Yes/No standard setting procedure 

• Data Analysis 

• Applied MFR Model to analyze panelist judgments 
Borderline Examinees for APES 
Rating Task for Panelists 


• Should the Borderline AP 2 Examinee answer the item 
correctly? 

• Yes: circle 1/2 on the rating form 

• No: read the Borderline PLD for AP 3 

• Should the Borderline AP 3 Examinee answer the item 
correctly? 

• Yes: circle 2/3 on the rating form 

• No: read the Borderline PLD for AP 4 

• Rating form example: Question A   1/2   (2/3)   3/4   4/5   Above 5 Cut 
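
The rating task above can be sketched as a short loop: the panelist considers each borderline examinee in turn and records the first cut at which the answer is "Yes". The slides spell out only the AP 2 and AP 3 steps; continuing the same pattern through 3/4 and 4/5 and falling through to "Above 5 Cut" is an assumption based on the rating form row, and the function and variable names are hypothetical.

```python
# Cut labels in the order a panelist considers them (assumed from the form).
CUTS = ["1/2", "2/3", "3/4", "4/5"]

def rate_item(yes_no_by_cut):
    """Return the rating a panelist would circle for one item.

    yes_no_by_cut maps each cut label to True ("Yes, the borderline
    examinee at this cut should answer correctly") or False ("No").
    """
    for cut in CUTS:
        if yes_no_by_cut[cut]:
            return cut          # first "Yes" ends the task for this item
    return "Above 5 Cut"        # "No" at every borderline level

# Example matching the form above: "No" for AP 2, "Yes" for AP 3.
judgments = {"1/2": False, "2/3": True, "3/4": True, "4/5": True}
print(rate_item(judgments))
```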
Results, Research Question 1: Panelists 

Panelist      Severity Measure   Mean Rating   SEM     INFIT   OUTFIT
3              0.17              4.02          0.09    1.07    1.03
6              0.17              4.03          0.09    1.43    1.42
11             0.14              4.00          0.09    1.02    1.00
2              0.09              3.96          0.09    1.07    1.09
1             -0.13              3.82          0.09    0.73    0.71
9             -0.18              3.78          0.09    0.81    0.83
15            -0.29              3.71          0.09    0.74    0.77
4             -0.32              3.69          0.09    1.07    1.08
5             -0.35              3.67          0.09    1.38    1.35
12            -0.47              3.60          0.09    1.31    1.26
7             -0.50              3.58          0.09    0.84    0.84
13            -0.55              3.55          0.09    1.01    1.01
14            -0.58              3.53          0.09    0.73    0.81
10            [illegible: "083"]
[illegible]   -1.38              3.08          0.10    0.65    [missing]
Panelist 8 residual plot 
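
For readers unfamiliar with the fit columns, the sketch below shows the standard way infit and outfit mean squares, and the standardized residuals behind a residual plot, are computed from observed ratings, model-expected ratings, and model variances. The function names and the numbers are hypothetical; the slides do not show the software or the computation actually used.

```python
import math

def standardized_residuals(observed, expected, variance):
    """z = (observed - expected) / sqrt(model variance), one per rating."""
    return [(x - e) / math.sqrt(v)
            for x, e, v in zip(observed, expected, variance)]

def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one panelist or item."""
    z = standardized_residuals(observed, expected, variance)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(zi ** 2 for zi in z) / len(z)
    # Infit: squared residuals weighted by the model variance.
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit

# Hypothetical ratings for one panelist on four items.
obs = [3, 4, 2, 5]
exp = [3.2, 3.6, 2.5, 4.1]
var = [0.8, 0.9, 0.7, 0.6]
infit, outfit = fit_mean_squares(obs, exp, var)
print(f"infit={infit:.2f} outfit={outfit:.2f}")
```

Values near 1.0 indicate ratings about as variable as the model expects; outfit is more sensitive to isolated unexpected ratings because it is not variance-weighted.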
Results, Research Question 1: Items 

Item   Difficulty Measure   Mean Rating   S.E.    Infit MSE   Outfit MSE
65     2.14                 5.23          0.23    0.91        0.85
70     2.09                 5.19          0.23    0.76        0.75
33     1.79                 4.98          0.22    0.91        0.91
84     1.66                 4.88          0.21    0.73        0.72
40     1.48                 4.74          0.21    0.90        0.92
53     1.48                 4.74          0.21    0.57        0.58
8      1.44                 4.70          0.21    1.08        1.10
51     1.27                 4.56          0.20    0.76        0.74
86     1.27                 4.56          0.20    0.30        0.30
75     1.23                 4.53          0.20    0.83        0.82
9      1.19                 4.49          0.20    0.98        0.97
Relationship between observed and 
judged item difficulties 



r = 0.54 
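
A correlation like the one reported here can be computed as a Pearson coefficient between the empirical item difficulties and the judged difficulties from the MFR analysis. The sketch below is a minimal version with made-up numbers; the function name and both data lists are hypothetical and are not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical difficulties for five items (logits).
observed_difficulty = [-1.2, -0.4, 0.1, 0.8, 1.5]
judged_difficulty = [-0.6, 0.3, -0.2, 1.1, 0.9]
print(f"r = {pearson_r(observed_difficulty, judged_difficulty):.2f}")
```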
Results, Research Question 1: Rounds 

Round   Measure   S.E.    Infit MSE   Outfit MSE
1        0.16     0.03    1.07        1.06
2       -0.16     0.03    0.91        0.91
Mean     0.00     0.03    0.99        0.98
SD       0.03     0.00    0.08        0.07
Results, Research Question 1: Performance Standards 

Category   Count   Percentage   Mean     OUTFIT   Rasch Threshold   S.E.

Round 1
Above 5     71      5            0.69    1.20      1.75             0.13
4/5        212     14            0.51    0.80      0.86             0.08
3/4        437     29           -0.22    1.10     -0.24             0.06
2/3        571     38           -0.89    1.00     -2.37             0.09
1/2        194     13           -1.60    1.10

Round 2
Above 5    119      8            1.19    0.90      1.50             0.11
4/5        222     15            0.76    0.70      1.14             0.07
3/4        494     33           -0.01    0.90     -0.29             0.06
2/3        505     34           -0.75    0.90     -2.34             0.10
1/2        145     10           -1.45    1.10
Category Characteristic Functions 

[Figure: category characteristic functions]

[Figure: variable map with columns Logit, +Panelist, +Item, -Round, -Gender, -Level, Scale]

Research Question 2: Gender Facet 

Gender    Measure   SEM     INFIT   OUTFIT
Males      0.05     0.03    0.88    0.88
Females   -0.05     0.04    1.14    1.13
Mean       0.00     0.03    1.01    1.01
SD         0.05     0.00    0.13    0.13
Research Question 2: Level of Course Taught 

Level         Measure   SE      Infit MSE   Outfit MSE
College        0.05     0.03    0.90        0.90
High School   -0.05     0.04    1.11        1.11
Mean           0.00     0.03    1.01        1.00
SD             0.05     0.00    0.10        0.11
Discussion 


• Benefits of utilizing MFR Model for evaluating 
standard setting ratings: 

• Holistic depiction on variable map 

• Panelist and item-specific residuals 

• Can incorporate explanatory variables 

• Validity evidence, both internal and procedural 

• Provided evidence of acceptable quality of ratings 
THANK YOU! 